Efficient algorithms for computing rank‐revealing factorizations on a GPU
Abstract
Standard rank-revealing factorizations such as the singular value decomposition (SVD) and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms to cast most operations in terms of the Level-3 BLAS. This article presents two alternative algorithms for computing a rank-revealing factorization of the form $\mathbf{A} = \mathbf{U}\mathbf{T}\mathbf{V}^{*}$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and $\mathbf{T}$ is trapezoidal (or triangular if $\mathbf{A}$ is square). Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix-matrix multiplication, which is exceptionally efficient on a GPU. Numerical experiments illustrate that these algorithms achieve significant acceleration over finely tuned GPU implementations of the SVD while providing low rank approximation errors close to those of the SVD.
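As a rough illustration of the idea described in the abstract, the NumPy sketch below builds a factorization A = U R Vᵀ in which the orthogonal factor V comes from a randomized projection refined by a few power iterations, so that almost all of the work is matrix-matrix multiplication plus unpivoted QR. It is a simplified, unblocked CPU sketch for real matrices, not the authors' GPU implementation. The function name `randomized_urv`, the parameter `q` (number of power iterations), the seeds, and the test matrix are all assumptions made for this example.

```python
import numpy as np


def randomized_urv(A, q=2, seed=0):
    """Hypothetical helper: factor a real m x n matrix A as U @ R @ V.T.

    V is n x n orthogonal, U has orthonormal columns, and R is upper
    triangular (m >= n) or upper trapezoidal (m < n).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Start from an orthonormalized Gaussian random matrix.
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Power iterations: each pass multiplies by A.T @ A and re-orthonormalizes,
    # which pushes the leading columns of V toward the dominant right
    # singular directions of A. The cost is dominated by matrix products.
    for _ in range(q):
        V, _ = np.linalg.qr(A.T @ (A @ V))
    # Unpivoted QR of A @ V gives A @ V = U @ R, hence A = U @ R @ V.T.
    U, R = np.linalg.qr(A @ V)
    return U, R, V


# Illustrative test matrix with known singular values 0.8**j (an assumption).
rng = np.random.default_rng(1)
m, n, k = 60, 40, 10
X, _ = np.linalg.qr(rng.standard_normal((m, n)))
Y, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.8 ** np.arange(n)
A = (X * sigma) @ Y.T

U, R, V = randomized_urv(A, q=2)

# Rank-k approximation obtained by truncating the factorization.
A_k = U[:, :k] @ R[:k, :] @ V.T
err = np.linalg.norm(A - A_k, 2)
# sigma[k] is the (k+1)-th singular value, the best possible rank-k error.
print("truncation error:", err, "optimal error:", sigma[k])
```

Comparing the truncation error with `sigma[k]` gives a sense of how "rank-revealing" the factorization is for a given `q`. On a GPU, the products `A @ V` and `A.T @ (...)` would map onto Level-3 BLAS matrix-matrix multiplication, which is the property the abstract highlights.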
Similar resources
Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application
With the rapid development of the internet, the amount of information and data being produced is extremely large. As a result, clients are overwhelmed by the volume of data and find it difficult to tell which items are useful. Data mining can overcome this problem. When data mining is run on cloud computing, it reduces processing time, energy usage and costs. As the speed of ...
Provably Efficient GPU Algorithms
In this paper we present an abstract model for algorithm design on GPUs by extending the parallel external memory (PEM) model with computations in internal memory (commonly known as shared memory in GPU literature) defined in the presence of memory banks and bank conflicts. We also present a framework for designing bank conflict free algorithms on GPUs. Using our framework we develop the first ...
I/O-Efficient Algorithms for Computing Contours on a Terrain
A terrain M is the graph of a bivariate function. We assume that M is represented as a triangulated surface with N vertices. A contour (or isoline) of M is a connected component of a level set of M. Generically, each contour is a closed polygonal curve; at “critical” levels these curves may touch each other or collapse to a point. We present I/O-efficient algorithms for the following two problem...
GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU
Voting algorithms, such as histogram and Hough transforms, are frequently used in various domains, such as statistics and image processing. Algorithms in these domains may be accelerated using GPUs. Implementing voting algorithms efficiently on a GPU, however, is far from trivial due to irregular and unpredictable memory accesses. Existing GPU implementations therefore target only...
Journal
Journal title: Numerical Linear Algebra with Applications
Year: 2023
ISSN: 1070-5325, 1099-1506
DOI: https://doi.org/10.1002/nla.2515